Fast and Space-Efficient Computation of Equi-Depth Histograms for Data Streams

نویسنده

Hamid Mousavi

چکیده

Equi-depth histograms represent a fundamental synopsis widely used in both database and data stream applications, as they provide the cornerstone of many techniques such as query optimization, approximate query answering, distribution fitting, and parallel database partitioning. Equi-depth histograms try to partition a sequence of data in a way that every part has the same number of data items. In this paper, we present a new algorithm to estimate equi-depth histograms for high speed data streams over sliding windows. While many previous methods were based on quantile computations, we propose a new method called BAr Splitting Histogram (BASH) that provides an expected ε-approximate solution to compute the equi-depth histogram. Extensive experiments show that BASH is at least four times faster than one of the best existing approaches, while achieving the same accuracy and using less memory. The experimental results also indicate that BASH is more stable on data affected by frequent concept shifts.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Quantiles and Equi-depth Histograms over Streams

متن کامل

Equi-depth Histogram Construction for Big Data with Quality Guarantees

The amount of data generated and stored in cloud systems has been increasing exponentially. The examples of data include user generated data, machine generated data as well as data crawled from the Internet. There have been several frameworks with proven efficiency to store and process the petabyte scale data such as Apache Hadoop, HDFS and several NoSQL frameworks. These systems have been wide...

متن کامل

Dynamic Maintenance of Wavelet-Based Histograms

In this paper, we introduce an e cient method for the dynamic maintenance of wavelet-based histograms (and other transform-based histograms). Previous work has shown that wavelet-based histograms provide more accurate selectivity estimation than traditional histograms, such as equi-depth histograms. But since wavelet-based histograms are built by a nontrivial mathematical procedure, namely, wav...

متن کامل

A Survey of Synopsis Construction in Data Streams

The large volume of data streams poses unique space and time constraints on the computation process. Many query processing, database operations, and mining algorithms require efficient execution which can be difficult to achieve with a fast data stream. In many cases, it may be acceptable to generate approximate solutions for such problems. In recent years a number of synopsis structures have b...

متن کامل

An Efficient Parallel Algorithm for High Dimensional Similarity Join - Parallel Processing Symposium, 1998, and Symposium on Parallel and Distributed Processing 1998. 19

Multidimensional similarity join finds pairs of multidimensional points that are within some small distance of each other: The 6-k-d-B tree has been proposed as a data structure that scales better as the number of dimensions increases compared to previous data structures. We present a cost model of the E-k-d-B tree and use it to optimize the leaf size. We present novel parallel algorithms for t...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2010

Fast and Space-Efficient Computation of Equi-Depth Histograms for Data Streams

نویسنده

چکیده

منابع مشابه

Quantiles and Equi-depth Histograms over Streams

Equi-depth Histogram Construction for Big Data with Quality Guarantees

Dynamic Maintenance of Wavelet-Based Histograms

A Survey of Synopsis Construction in Data Streams

An Efficient Parallel Algorithm for High Dimensional Similarity Join - Parallel Processing Symposium, 1998, and Symposium on Parallel and Distributed Processing 1998. 19

عنوان ژورنال:

اشتراک گذاری